Useful DESeq tutorials:
Useful fGSEA tutorials:
The experimental samples consist of 72 mice. 18 mice were WT SPF, 18 were KO SPF, 18 were WT GF, and 18 were KO GF. At every 4 hours, starting from ZT2, 3 mice from each group were sac’d and had liver RNA content extracted. In this dataset are all the mice that were sac’d in the first timepoint, i.e. 3 WTSPF, 3 KOSPF, etc.
This data was generated by first running STAR alignment on the Midway server on the sequencing results. A featurecounts table was generated, which was filtered using a custom python script to separate data by timepoint.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("apeglm")
BiocManager::install("DESeq2")
BiocManager::install("genefilter")
BiocManager::install("AnnotationHub")
BiocManager::install("org.Mm.eg.db")
BiocManager::install("GO.db")
BiocManager::install("biomaRt")
BiocManager::install("vsn")
install.packages(c("hexbin","feather","RColorBrewer","viridis","pheatmap"))
library(feather)
library(DESeq2)
library(dplyr)
library(magrittr)
library(genefilter)
library(AnnotationHub)
library(org.Mm.eg.db)
library(GO.db)
library(vsn)
library(pheatmap)
library(biomaRt)
library(curl)
library(RColorBrewer)
library(viridis)
Grab a list of all mouse protein coding genes from ensemble, read in RNA Seq data, and filter by protein coding genes.
mouseProteinCodingGenes <- useMart("ensembl") %>%
useDataset("mmusculus_gene_ensembl",mart=.) %>%
getBM(attributes=c("ensembl_gene_id","external_gene_name","description"), filters='biotype', values=c('protein_coding'), mart=.)
df <- read_feather("time_1_2.feather") %>%
.[rowSums(sapply(., '%in%', mouseProteinCodingGenes$ensembl_gene_id)) > 0,]
df <- set_rownames(df, df$Geneid) %>% dplyr::select(-Geneid)
DESeq requires a leveled dataframe containing the metadata. This dataframe must be in the same order as the samples in the featurecounts data. DESeq will use this dataframe to generate the analysis.
For this dataset, this is what the condition list looked like:
| Genotype | Condition | |
|---|---|---|
| Sample1 | WT | SPF |
| Sample2 | WT | SPF |
| Sample3 | WT | SPF |
| Sample4 | KO | SPF |
| Sample5 | KO | SPF |
| Sample5 | KO | SPF |
| Sample6 | WT | GF |
| Sample7 | WT | GF |
| Sample8 | WT | GF |
| Sample9 | KO | GF |
| Sample10 | KO | GF |
| Sample11 | KO | GF |
gt_list <- rep(c(rep("WT",3), rep("KO",3)),2)
cond_list <- c(rep("SPF",6),rep("GF",6))
condition_list <- data.frame(row.names=colnames(df), Genotype=gt_list, Condition=cond_list)
condition_list$Genotype <- relevel(condition_list$Genotype, "WT")
condition_list$Condition <- relevel(condition_list$Condition, "SPF")
Generate DESeqDataSet (dds) object from the data and metadata
dds <- DESeq2::DESeqDataSetFromMatrix(countData=df,
colData = condition_list,
design = ~ Genotype + Condition + Genotype:Condition)
Take a look at the dds object
head(dds)
class: DESeqDataSet
dim: 6 12
metadata(1): version
assays(1): counts
rownames(6): ENSMUSG00000051951 ENSMUSG00000025902 ...
ENSMUSG00000033813 ENSMUSG00000002459
rowData names(0):
colnames(12): KF_01 KF_02 ... KF_56 KF_57
colData names(2): Genotype Condition
We can visualize the dds object by variance stabilizing transformations (vst) and normalized transformed data (ntd), which we’ll later use to generate heatmaps and PCAs.
First, vsd
vsd <- vst(dds, blind = FALSE)
head(assay(vsd), 3)
KF_01 KF_02 KF_03 KF_19 KF_20
ENSMUSG00000051951 5.586513 5.586513 5.586513 5.777269 5.864151
ENSMUSG00000025902 6.648029 6.628288 6.679868 6.732869 6.339456
ENSMUSG00000033845 10.101805 10.138352 10.133246 10.064316 10.390333
KF_21 KF_37 KF_38 KF_39 KF_55
ENSMUSG00000051951 5.857618 5.586513 5.586513 5.586513 5.586513
ENSMUSG00000025902 6.873506 6.462250 6.277894 6.435737 6.185492
ENSMUSG00000033845 10.283528 10.641011 10.448340 10.612897 10.698524
KF_56 KF_57
ENSMUSG00000051951 5.586513 5.586513
ENSMUSG00000025902 6.314729 6.181333
ENSMUSG00000033845 10.434089 10.500890
sizeFactors for samples from vsd. A sizeFactor is a normalization value, count values/sizeFactor yields a common scale.
colData(vsd)
DataFrame with 12 rows and 3 columns
Genotype Condition sizeFactor
<factor> <factor> <numeric>
KF_01 WT SPF 1.06581143642804
KF_02 WT SPF 1.10841342430678
KF_03 WT SPF 1.00191112543556
KF_19 KO SPF 1.18865228958857
KF_20 KO SPF 1.12039876331493
... ... ... ...
KF_38 WT GF 0.977919662514898
KF_39 WT GF 0.875355342841283
KF_55 KO GF 0.95209046510223
KF_56 KO GF 0.959622775650204
KF_57 KO GF 0.965642851429522
Standard deviation plotted against rank means. A horizontal line indicates there is no dependence of standard deviation to the means, which is ideal.
meanSdPlot(assay(vsd))
And the same for ntd
ntd <- normTransform(dds)
meanSdPlot(assay(ntd))
vsd yields better results based off the meanSdPlot.
We can check a PCA of the transformed data
plotPCA(vsd, intgroup=c("Genotype","Condition"))
Gratifyingly, PCA agnostically separates out all four experimental groups, validating that they are indeed 4 different and discrete groups, with transcriptomic variation.
At this point, normalizing looks fine and the data looks fine, so deseq can be run.
dds is normalized using estimateSizeFactors
dds <- estimateSizeFactors(dds)
colData(dds)
DataFrame with 12 rows and 3 columns
Genotype Condition sizeFactor
<factor> <factor> <numeric>
KF_01 WT SPF 1.06581143642804
KF_02 WT SPF 1.10841342430678
KF_03 WT SPF 1.00191112543556
KF_19 KO SPF 1.18865228958857
KF_20 KO SPF 1.12039876331493
... ... ... ...
KF_38 WT GF 0.977919662514898
KF_39 WT GF 0.875355342841283
KF_55 KO GF 0.95209046510223
KF_56 KO GF 0.959622775650204
KF_57 KO GF 0.965642851429522
I create a “Group” column that contains both Genotype vs Condition in one column
dds$Group <- factor(paste0(dds$Genotype, dds$Condition))
dds$Group <- relevel(dds$Group, "WTSPF")
design(dds) <- ~ Group
dds <- DESeq(dds)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
I’ll compare all contrasts groups using DESeq’s results function, explicitly telling it to compare items based on the dds$group column, followed by the ‘control’ vs ‘experimental’ group in that comparison. In this experimental setup, it is slightly confusing since there are 4 groups being compared. In a typical treated vs untreated experiment, there would be no need to look at all the different types of comparisons made here.
#Compare SPF v GF in WT
WT_res <- results(dds,contrast=c("Group", "WTSPF", "WTGF"), tidy = TRUE)
#Compare SPF v GF in KO
KO_res <- results(dds, contrast=c("Group", "KOSPF", "KOGF"), tidy = TRUE)
#Compare WT v KO in SPF Conditions
SPF_res <- results(dds, contrast=c("Group", "WTSPF", "KOSPF"), tidy = TRUE)
#Compare WT v KO in GF Conditions
GF_res <- results(dds, contrast=c("Group", "WTGF", "KOGF"), tidy = TRUE)
summary(WT_res)
row baseMean log2FoldChange lfcSE
Length:17677 Min. : 0.0 Min. :-8.9255 Min. :0.0712
Class :character 1st Qu.: 2.9 1st Qu.:-0.2237 1st Qu.:0.1532
Mode :character Median : 96.8 Median : 0.0242 Median :0.2356
Mean : 1088.0 Mean : 0.0983 Mean :0.8129
3rd Qu.: 493.8 3rd Qu.: 0.3746 3rd Qu.:0.7417
Max. :1261675.8 Max. : 4.9813 Max. :4.4074
NA's :1283 NA's :1283
stat pvalue padj
Min. :-15.4830 Min. :0.0000 Min. :0.000
1st Qu.: -0.8081 1st Qu.:0.0799 1st Qu.:0.170
Median : 0.0966 Median :0.3670 Median :0.499
Mean : -0.0024 Mean :0.4122 Mean :0.487
3rd Qu.: 0.9702 3rd Qu.:0.7175 3rd Qu.:0.795
Max. : 11.6147 Max. :1.0000 Max. :1.000
NA's :1283 NA's :1286 NA's :5084
We can check the number of significant results and false discovery rates (multiple hypothesis corrections)
print(cat(sum(WT_res$pvalue < 0.05, na.rm=TRUE), # How many results had p val < 0.05
sum(!is.na(WT_res$pvalue)),
sum(WT_res$padj < 0.1, na.rm=TRUE), # How many results had a p adj val < 0.1
"\n"))
3441 16391 2413
NULL
print(cat(sum(KO_res$pvalue < 0.05, na.rm=TRUE),
sum(!is.na(KO_res$pvalue)),
sum(KO_res$padj < 0.1, na.rm=TRUE),
"\n"))
3464 16391 2409
NULL
print(cat(sum(SPF_res$pvalue < 0.05, na.rm=TRUE),
sum(!is.na(SPF_res$pvalue)),
sum(SPF_res$padj < 0.1, na.rm=TRUE),
"\n"))
2671 16391 1576
NULL
print(cat(sum(GF_res$pvalue < 0.05, na.rm=TRUE),
sum(!is.na(GF_res$pvalue)),
sum(GF_res$padj < 0.1, na.rm=TRUE),
"\n"))
1938 16391 955
NULL
We can check shrunken log fold change (lfc) and check distribution of said lfc using lfcShrink. Possible values for coef can be seen using resultsNames(dds). lfc requires a comparison between 2 groups, for obvious reasons, so only 2 groups can be compared at a time.
condition_LFC <- lfcShrink(dds, coef = "Group_KOSPF_vs_WTSPF", type = 'apeglm')
using 'apeglm' for LFC shrinkage. If used in published research, please cite:
Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
sequence count data: removing the noise and preserving large differences.
Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
A MA-plot can be generated. This is a plot of log2foldchange (determined by lfcShrink) and mean of normalized counts.
p <- DESeq2::plotMA(condition_LFC, ylim=c(-2,2))
Fold change can be visualized using some heatmaps. ensemble IDs are converted to gene names using the mouseProteinCodingGenes table generated from biomaRt earlier. mat is the dataframe used to generate the heatmap, and anno is used for pheatmap’s annotation
number_of_genes <- 50 # change this to the number of genes you want to be generated in the heatmap
topVarGenes <- head(order(rowVars(assay(vsd)), decreasing = TRUE), number_of_genes)
mat <- assay(vsd)[ topVarGenes, ]
mat <- mat - rowMeans(mat)
geneNames <- data.frame(
row.names(mat)[match(mouseProteinCodingGenes$ensembl_gene_id, row.names(mat))],
mouseProteinCodingGenes$external_gene_name) %>%
na.omit() %>%
set_colnames(c('Geneid','external_gene_name')) %>%
.[match(row.names(mat),.$Geneid),]
mat <- set_rownames(mat,make.names(geneNames$external_gene_name, unique = T))
anno <- as.data.frame(colData(vsd)[, c("Genotype","Condition")])
Inferno theme heatmap with unclustered columns
# Inferno theme heatmap with unclustered columns
p1 <- pheatmap(mat, color = inferno(length(mat) - 1), annotation_col = anno, cluster_cols = F)
p1
Classic theme heatmap, columns and rows are clustered via pheatmap
# Classic theme heatmap, columns and rows are clustered via pheatmap
p2 <- pheatmap(mat, annotation_col = anno)
p2
Inferno theme heatmap with clustered rows and columns via pheatmap
# Inferno theme heatmap with clustered rows and columns via pheatmap
p3 <- pheatmap(mat, color = inferno(length(mat) - 1), annotation_col = anno)
p3
This is a heatmap of sample-to-sample distance. It shows how similar genotype-condition samples are to each other based on transformed data.
sampleDists <- dist(t(assay(vsd)))
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(vsd$Genotype, vsd$Condition, sep="-")
colnames(sampleDistMatrix) <- paste(vsd$Genotype, vsd$Condition, sep="-")
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
clustering_distance_rows=sampleDists,
clustering_distance_cols=sampleDists,
col=colors)
GSEA works by having a method to rank genes in a list of genes. Some method of ranking must be used, or GSEA will not know how to decide if a pathway is enriched or not. We’ll use the Wald statistic (a ratio of LFC to p val) as a hypothesis-generating ranking measurement, although other statistics from DESeq can be used (e.g. LFC or p adj alone), depending on the desired outcome of pathway analysis.
BiocManager::install("fgsea")
install.packages(c("tidyverse","ggpubr"))
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(fgsea)
library(tidyverse)
library(ggpubr)
theme_set(theme_pubr())
Genesets are essentially data structures that contain some grouping of genes, e.g. pathways or locations, and a list of genes for that group. Pathways are downloaded from gsea-msigdb. For this notebook, I put them on gdrive to url request, but otherwise the pathways should be downloaded from there. The file name will include how the genes are identified - here we will use ‘symbols,’ since STAR alignment gave us ensembl IDs.
download.file("https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://drive.google.com/uc%3Fexport%3Ddownload%26id%3D133tI8ocIh2q0MsW51jerDePEnDH71Gcs&service=writely",
destfile = "c2.cp.kegg.v7.0.symbols.gmt")
download.file("https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://drive.google.com/uc%3Fexport%3Ddownload%26id%3D1WpWISZeUUjRI6BlUKmUcJummzTxxTE8y&service=writely",
destfile = "c5.all.v7.0.symbols.gmt")
download.file("https://drive.google.com/uc?export=download&id=1NfI1jBJkNROCDwdXe92nr3d7cffUw-fy",
destfile = "c5.bp.v7.0.symbols.gmt")
download.file("https://drive.google.com/uc?export=download&id=1WydckDxDgehnahCwHsqKhqUAdHCqRE4h",
destfile = "h.all.v7.0.symbols.gmt")
fgsea_output <- list() # This will become a datastructure that will hold multiple fGSEA results
fgsea_output_plots <- list()
bm <- getBM(attributes=c("ensembl_gene_id", "hsapiens_homolog_associated_gene_name"), mart=ensemblMouse) %>%
distinct() %>%
as_tibble() %>%
na_if("") %>%
na.omit()
We’ll compare pathways in the same way we generated our differential analysis comparisons, e.g. the 4-way analysis we ran previously. In other words, we can pull the Wald statistic from the 4 results we recieved from DESeq to generate ranked lists for different pathway analysis comparisons.
We also need to consider the pathway genesets we use. For WT SPF, I’ll run pathway analysis using Hallmark, KEGG, Gene Ontology with all tags, and Gene Ontology with only Biological Processess tags.
For the sake of the tutorial, I’ll only run all databases for WT_res. I’ll run KEGG and GO for KO_res, then compare results for WT and KO to determine what pathways are being expressed depending on the genotype that vary by microbial status, or are independent of microbial status.
First I need to generate the ranked list. I build res2, a ranked gene list, based on WT_res. I will use this to run fGSEA using each of the different pathways.
res2 <- inner_join(WT_res, bm, by=c("row"="ensembl_gene_id")) %>%
dplyr::select(hsapiens_homolog_associated_gene_name, stat) %>%
na.omit() %>%
distinct() %>%
group_by(hsapiens_homolog_associated_gene_name) %>%
summarize(stat=mean(stat))
Take a brief look at the ranks generated from WT_res
ranks <- deframe(res2)
head(ranks, 20)
A1BG A1CF A2M A3GALT2 A4GALT A4GNT AAAS
-0.45147358 1.03819985 1.36453849 0.19675448 1.05196147 -0.28252894 0.51368498
AACS AADAC AADAT AAGAB AAK1 AAMDC AAMP
0.10147056 -3.04044117 -3.57487140 -3.79072943 2.78468647 -2.25750312 -0.97911398
AANAT AAR2 AARD AARS1 AARS2 AARSD1
0.15390472 -0.77264471 -0.47761889 -0.63080907 -0.48650356 0.08451519
I use fgsea() with nperm set to 100000 based on earlier testing
fgseaResTidy <- fgsea(pathways=gmtPathways("h.all.v7.0.symbols.gmt"), stats=ranks, nperm=10000) %>%
as_tibble() %>%
filter(padj < 0.05) %>%
arrange(desc(NES))
Results can be seen in a table
fgseaResTidy %>%
dplyr::select(-leadingEdge, -ES, -nMoreExtreme) %>%
arrange(padj)
Finally, we can see how pathways are enriched or not enriched in the SPF condition compared to the GF condition for WT mice.
fgsea_output_plots$WT$hallmark <- ggplot(fgseaResTidy, aes(reorder(pathway, NES), NES)) +
geom_col() +
coord_flip() +
labs(x="Pathway", y="Normalized Enrichment Score",
title="WT SPF vs GF: Hallmark pathways NES from GSEA")
fgsea_output_plots$WT$hallmark
We’ll go ahead and run this pipe with the remaining pathway databases. I add all the fgsea results to a datastructure fgsea_output and plots to a datastructure fgsea_output_plots.
fgsea_output$WT$KEGG <- fgsea(pathways=gmtPathways("c2.cp.kegg.v7.0.symbols.gmt"), ranks, nperm=10000) %>%
as_tibble() %>%
filter(padj < 0.05) %>%
arrange(padj)
fgsea_output_plots$WT$KEGG <- ggplot(fgsea_output$WT$KEGG, aes(reorder(pathway, NES), NES)) +
geom_col() +
coord_flip() +
labs(x="Pathway", y="Normalized Enrichment Score",
title="WT SPF vs GF: KEGG pathways NES from GSEA")
fgsea_output$WT$GO_BP <- fgsea(pathways=gmtPathways("c5.bp.v7.0.symbols.gmt"), ranks, nperm=10000) %>%
as_tibble() %>%
filter(padj < 0.05) %>%
arrange(padj)
fgsea_output_plots$WT$GO_BP <- ggplot(fgsea_output$WT$GO_BP, aes(reorder(pathway, NES), NES)) +
geom_col() +
coord_flip() +
labs(x="Pathway", y="Normalized Enrichment Score",
title="WT SPF vs GF: GO biological processes pathways NES from GSEA")
Now to create a gene list using KO DESeq results
res2 <- inner_join(KO_res, bm, by=c("row"="ensembl_gene_id")) %>%
dplyr::select(hsapiens_homolog_associated_gene_name, stat) %>%
na.omit() %>%
distinct() %>%
group_by(hsapiens_homolog_associated_gene_name) %>%
summarize(stat=mean(stat))
ranks <- deframe(res2)
head(ranks, 20)
A1BG A1CF A2M A3GALT2 A4GALT A4GNT AAAS
-1.14384169 3.50054410 -0.82675569 0.62891093 0.64358223 -0.28076222 -0.52790413
AACS AADAC AADAT AAGAB AAK1 AAMDC AAMP
1.21589930 -1.94850059 -6.49019486 -3.80727864 0.28445488 -4.75224452 -1.92405882
AANAT AAR2 AARD AARS1 AARS2 AARSD1
0.00000000 -1.03778726 -0.01154298 -1.72596764 0.95231562 1.10142757
And subsequently run fGSEA
fgsea_output$KO$KEGG <- fgsea(pathways=gmtPathways("c2.cp.kegg.v7.0.symbols.gmt"), ranks, nperm=10000) %>%
as_tibble() %>%
filter(padj < 0.05) %>%
arrange(padj)
fgsea_output_plots$KO$KEGG <- ggplot(fgsea_output$KO$KEGG, aes(reorder(pathway, NES), NES)) +
geom_col() +
coord_flip() +
labs(x="Pathway", y="Normalized Enrichment Score",
title="KO SPF vs GF: Kegg pathways NES from GSEA")
fgsea_output$KO$GO_BP <- fgsea(pathways=gmtPathways("c5.bp.v7.0.symbols.gmt"), ranks, nperm=10000) %>%
as_tibble() %>%
filter(padj < 0.05) %>%
arrange(padj)
fgsea_output_plots$KO$GO_BP <- ggplot(fgsea_output$KO$GO_BP, aes(reorder(pathway, NES), NES)) +
geom_col() +
coord_flip() +
labs(x="Pathway", y="Normalized Enrichment Score",
title="KO SPF vs GF: GO biological processes pathways NES from GSEA")
f1 <- ggarrange(fgsea_output_plots$WT$KEGG, fgsea_output_plots$KO$KEGG,
labels = c("WT KEGG", "KO KEGG"))
annotate_figure(
f1,
top = text_grob("\nWT and KO SPF vs GF KEGG pathways\n",
color = "red", face = "bold", size = 20)
)
A positive normalized enrichment score (NES) means more genes expressed for this pathway in SPF, and the pathway is enriched in SPF, a negative score means more genes expressed for this pathway in GF.
We can see what pathways showed up in both WT and KO analyses above. I’ll check KEGG and GO databases and show results only for pathways in both WT or KO conditions.
This method of combining results can be expanded using basic set theory. I am only doing an intersection in this excersize.
fgsea_output$WT_KO$KEGG <- inner_join(fgsea_output$WT$KEGG, fgsea_output$KO$KEGG, by = "pathway", suffix = c(".WT", ".KO"))
fgsea_output_plots$WT_KO$KEGG <- data.frame(pathway = fgsea_output$WT_KO$KEGG$pathway, "WT NES" = fgsea_output$WT_KO$KEGG$NES.WT, "KO NES" = fgsea_output$WT_KO$KEGG$NES.KO) %>%
reshape2::melt(id="pathway") %>%
ggplot(aes(x = reorder(pathway, -value), y = value, fill = variable)) +
geom_col() +
coord_flip() +
facet_wrap(~ variable, labeller = labeller(variable = c("WT.NES" = "WT SPF vs GF", "KO.NES" = "KO SPF vs GF"))) +
xlab("Pathway") +
ylab("Normalized Enrichment Score") +
ggtitle("\nSPF vs GF pathway enrichment\n") +
theme(legend.position = "none", plot.title = element_text(size = 20, color = "red", face = "bold"))
fgsea_output_plots$WT_KO$KEGG
fgsea_output$WT_KO$GO_BP <- inner_join(fgsea_output$WT$GO_BP, fgsea_output$KO$GO_BP, by = "pathway", suffix = c(".WT", ".KO"))
fgsea_output_plots$WT_KO$GO_BP <- data.frame(pathway = fgsea_output$WT_KO$GO_BP$pathway, "WT NES" = fgsea_output$WT_KO$GO_BP$NES.WT, "KO NES" = fgsea_output$WT_KO$GO_BP$NES.KO) %>%
reshape2::melt(id="pathway") %>%
ggplot(aes(x = reorder(pathway, -value), y = value, fill = variable)) +
geom_col() +
coord_flip() +
facet_wrap(~ variable, labeller = labeller(variable = c("WT.NES" = "WT SPF vs GF", "KO.NES" = "KO SPF vs GF"))) +
xlab("Pathway") +
ylab("Normalized Enrichment Score") +
ggtitle("\nSPF vs GF GO BP pathway enrichment\n") +
theme(legend.position = "none", plot.title = element_text(size = 20, color = "red", face = "bold"))
fgsea_output_plots$WT_KO$GO_BP
Thus, varying microbial condition between WT and L-Bmal1-KO at ZT2 did not lead to different pathway expression in the mice.
Further analysis includes looking at how varying genotype leads to differences between SPF and GF conditions at multiple timepoints.
This completes the tutorial for running differential analysis followed by pathway analysis using DESeq2 and fGSEA.
smanzoor@uchicago.edu